580 research outputs found

    Redescription Mining and Applications in Bioinformatics

    Our ability to interrogate the cell and computationally assimilate its answers is improving at a dramatic pace. For instance, the study of even a focused aspect of cellular activity, such as gene action, now benefits from multiple high-throughput data acquisition technologies such as microarrays, genome-wide deletion screens, and RNAi assays. A critical need is the development of algorithms that can bridge, relate, and unify diverse categories of data descriptors. Redescription mining is such an approach. Given a set of biological objects (e.g., genes, proteins) and a collection of descriptors defined over this set, the goal of redescription mining is to use the given descriptors as a vocabulary and find subsets of data that afford multiple definitions. The premise of redescription mining is that subsets that afford multiple definitions are likely to exhibit concerted behavior and are, hence, interesting. We present algorithms for redescription mining based on formal concept analysis and applications of redescription mining to multiple biological datasets. We demonstrate how redescriptions identify conceptual clusters of data using mutually reinforcing features, without explicit training information.
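As a rough illustration of what "subsets that afford multiple definitions" could mean in practice, the sketch below treats each descriptor as the set of objects it covers and reports pairs of descriptors whose sets coincide (or nearly coincide) under Jaccard similarity. This is a brute-force pairwise comparison chosen for brevity, not the formal-concept-analysis algorithms the abstract refers to. The gene identifiers, descriptor names, and the `find_redescriptions` helper are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_redescriptions(descriptors, min_jaccard=1.0):
    """Return (name_a, name_b, score) for descriptor pairs that describe
    (almost) the same subset of objects.

    descriptors maps a descriptor name to the set of objects it covers.
    Hypothetical helper: a naive pairwise scan, not an FCA-based miner.
    """
    results = []
    # Sort by name so the output order is deterministic.
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(descriptors.items()), 2):
        score = jaccard(set_a, set_b)
        if score >= min_jaccard:
            results.append((name_a, name_b, score))
    return results


# Made-up toy data: three descriptors over four genes.
descriptors = {
    "upregulated_in_heat_shock": {"g1", "g2", "g3"},
    "deletion_is_lethal": {"g2", "g3", "g4"},
    "has_motif_X": {"g1", "g2", "g3"},
}

exact = find_redescriptions(descriptors)            # identical subsets only
approx = find_redescriptions(descriptors, 0.5)      # tolerate partial overlap
```

Here the subset {g1, g2, g3} has two independent definitions, one from expression data and one from sequence motifs, which is the kind of agreement a redescription is meant to surface. Lowering `min_jaccard` admits approximate redescriptions at the cost of more candidate pairs.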

    Towards Neural Numeric-To-Text Generation From Temporal Personal Health Data

    With an increased interest in the production of personal health technologies designed to track user data (e.g., nutrient intake, step counts), there is now more opportunity than ever to surface meaningful behavioral insights to everyday users in the form of natural language. This knowledge can increase their behavioral awareness and allow them to take action to meet their health goals. It can also bridge the gap between the vast collection of personal health data and the summary generation required to describe an individual's behavioral tendencies. Previous work has focused on rule-based time-series data summarization methods designed to generate natural language summaries of interesting patterns found within temporal personal health data. We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data. We showcase the effectiveness of our models on real user health data logged in MyFitnessPal and show that we can automatically generate high-quality natural language summaries. Our work serves as a first step towards the ambitious goal of automatically generating novel and meaningful temporal summaries from personal health data.
    Comment: 5 pages, 2 figures, 1 table